Patterns and determinants of the global herbivorous mycobiome

Despite its role in host nutrition, the anaerobic gut fungal (AGF) component of the herbivorous gut microbiome remains poorly characterized. Here, to examine global patterns and determinants of AGF diversity, we generate and analyze an amplicon dataset from 661 fecal samples from 34 mammalian species, 9 families, and 6 continents. We identify 56 novel genera, greatly expanding AGF diversity beyond current estimates (31 genera and candidate genera). Community structure analysis indicates that host phylogenetic affiliation, rather than domestication status or biogeography, shapes the community. Fungal-host associations are stronger and more specific in hindgut fermenters than in foregut fermenters. Transcriptomics-enabled phylogenomic and molecular clock analyses of 52 strains from 14 genera indicate that most genera with preferences for hindgut hosts evolved earlier (44-58 Mya) than those with preferences for foregut hosts (22-32 Mya). Our results greatly expand the documented scope of AGF diversity and provide an ecologically and evolutionarily grounded model to explain the observed patterns of AGF diversity in extant animal hosts.

We studied the gut mycobiome (Phylum Neocallimastigomycota) of herbivorous mammals. We tested the effect of animal host phylogeny, animal gut type, and animal domestication status on the gut mycobiome. Other covariates included sex, age, and geographical location. The total number of experimental units is 661. The replicates per treatment, when considering host species as treatments, varied from 1 to 229.
Feces from an individual animal were considered a single sample. Samples from individuals belonging to the same host species were considered biological replicates of that species. Fresh, moist fecal samples originating from a single animal were scooped immediately post-defecation into sterile 50-ml plastic tubes and rapidly sealed.
All individuals collecting samples from domesticated animals were owners or caretakers and were authorized and trained in animal keeping and husbandry. For domesticated animals reared in a research institution, all IRB protocols for animal rearing were observed. Sampling of wild herbivores was conducted through a partnership with hunting communities, removing the need to obtain hunting permits. All hunters had the appropriate licenses, and the animals were shot on public land during the hunting season.
No sample size calculation was performed. Sample sizes were chosen based on sample availability. The sample sizes were sufficient to decouple the association of host evolutionary history with AGF diversity, which is the major goal of this work.
Sampling metadata was collected by sampling teams at the respective locations and reported by email to the study team, who aggregated the metadata in spreadsheets. Sampling data was collected by: John and Maria Jinks, James and Tammy
Sampling was conducted between September 2021 and September 2022. Samples originated predominantly from the USA and Europe. However, in order to cover as much vertebrate and geographical diversity as possible, many samples were also taken from other countries (see the metadata provided with the manuscript). The frequency of sampling and the replication per animal species depended on sample and manpower availability.
A small subset of samples was excluded from all analyses because of insufficient sampling depth. For comparison of alpha diversity between samples, patterns were assessed in samples with at least 1000 sequences (n=421 samples). The importance of various factors (host-associated factors, e.g., gut type, animal family, or animal species; domestication status; and biogeography) in shaping the observed patterns of alpha diversity was examined using ANOVA. Only samples belonging to a factor level with at least 10 replicates were included in the analysis. These included foregut and hindgut (for the gut type comparison), the families Bovidae, Cervidae, and Equidae (for the animal family comparison), cows, goats, sheep, deer, and horses (for the animal genus comparison), and domesticated and non-domesticated (for the domestication status comparison). We opted to carry out comparisons of the effect of biogeography, age, and sex only on samples belonging to the same animal species (only the four most-sampled animals were included) in an attempt to control for other host-associated factors that might confound the results. Biogeography comparisons were conducted on cattle, goat, sheep, and horse datasets originating from the USA, Egypt, Germany, Italy, Austria, Czech Republic, New Zealand, and Argentina.
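The filtering and comparison described above can be illustrated with a minimal sketch. It is not the authors' code: the record keys (`n_seqs`, `alpha`, `gut_type`) and function names are hypothetical, and only the two thresholds (1000 sequences, 10 replicates) come from the text. The sketch computes the one-way ANOVA F statistic using the standard library only; a p-value would additionally require an F-distribution routine (for example `scipy.stats.f_oneway`).

```python
from collections import defaultdict
from statistics import mean

MIN_SEQUENCES = 1000   # depth threshold stated in the text
MIN_REPLICATES = 10    # minimum replicates per factor level stated in the text


def filter_samples(samples, factor):
    """Keep samples with enough sequences, then keep levels with enough replicates.

    `samples` is a list of dicts with hypothetical keys 'n_seqs' and 'alpha',
    plus one key per host factor (e.g. 'gut_type').
    Returns {factor level: [alpha diversity values]}.
    """
    groups = defaultdict(list)
    for s in samples:
        if s["n_seqs"] >= MIN_SEQUENCES:
            groups[s[factor]].append(s["alpha"])
    return {lvl: vals for lvl, vals in groups.items() if len(vals) >= MIN_REPLICATES}


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over `groups`."""
    values = [v for vals in groups.values() for v in vals]
    k, n = len(groups), len(values)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand = mean(values)
    ss_between = sum(len(vals) * (mean(vals) - grand) ** 2 for vals in groups.values())
    ss_within = sum((v - mean(vals)) ** 2 for vals in groups.values() for v in vals)
    df_b, df_w = k - 1, n - k
    if ss_within == 0:
        # No within-group variance: the F ratio is undefined, report infinity.
        return float("inf"), df_b, df_w
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w
```

Calling `one_way_anova_f(filter_samples(samples, "gut_type"))` would then compare foregut and hindgut samples, provided each level retains at least 10 samples after depth filtering.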
We assessed each question with multiple analyses and compared our results to previous studies in order to assess reproducibility.
We randomly subsampled one individual per species, repeated this procedure a total of 100 times, and used the 95th percentile of the significance values across the individual subsamples to assess overall significance.
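The subsampling procedure above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the data layout (`samples_by_species`) and the callable `p_value_fn`, which stands in for whichever statistical test is applied to each subsample, are hypothetical. Only the one-individual-per-species draw, the 100 repetitions, and the 95% summary come from the text.

```python
import random
from statistics import quantiles


def subsample_one_per_species(samples_by_species, rng):
    """Pick one individual at random from each species."""
    return {sp: rng.choice(inds) for sp, inds in samples_by_species.items()}


def subsampled_significance(samples_by_species, p_value_fn, n_iter=100, seed=0):
    """Repeat the one-per-species subsampling and summarise the p-values.

    `p_value_fn` is a hypothetical callable that takes one subsample
    ({species: individual}) and returns the p-value of the test of interest.
    Returns the 95th percentile of the n_iter p-values.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    p_values = [
        p_value_fn(subsample_one_per_species(samples_by_species, rng))
        for _ in range(n_iter)
    ]
    # quantiles(..., n=100) returns 99 cut points; index 94 is the 95th percentile.
    return quantiles(p_values, n=100, method="inclusive")[94]
```

One possible reading, stated here as an assumption, is that an effect is treated as significant overall when this 95th-percentile value falls below the chosen threshold, meaning that at least 95% of the subsamples gave a significant result.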
No blinding was used, given that this study is not set up like a clinical trial.

March 2021